Asking the right question (or_where would we be without Ecto upgrades)

Peter Hastie
Bleacher Report Engineering
6 min readDec 5, 2016

--

The most recent release of Ecto, version 2.1.0, includes a new query macro or_where. This adds the ability to dynamically build WHERE x OR y clauses of an arbitrary length. This might not sound like a major improvement, but for one of the services in Bleacher Report’s data platform, it drastically simplified our code.

One of the limitations of Bleacher Report’s old data platform is that we had no way of matching up events sent from a user’s devices. If a user browses the site on their laptop, those events would be labelled differently from those sent from their Team Stream application. As we look to use these events to personalize a user’s Bleacher Report experience, consistency becomes more important.

To make this possible we created a very lightweight service called Portmeirion to dispense unique user IDs in response to the client sending over a set of identifiers, such as their device ID. If the identifiers have been seen before, Portmeirion will treat this as a request from an existing user and return the existing user id. Clients are expected to request a user ID before they send any events and to include that ID in all events they send.

As with all of our new user-facing services, Portmeirion is an Elixir project built on Phoenix. The service exposes an extremely lightweight API with just two endpoints: one for clients to request a user ID, another for clients to change the state of their user ID when a user logs in. The identification endpoint can accept a variety of identifiers to search for an existing ID for this user. These originally included login_id, device_id, and prism_id (a tracking cookie from our previous data platform), and we later added facebook_id. Some, all or none of these identifiers can be present in a request, and there is an informal hierarchy: If we can’t identify a user by login_id we will try their device_id and finally their prism_id.

My first attempt to match identifiers to a user made use of pattern-matched functions and short-circuiting logical OR operators. If there is no login_id present in the parameters, a call to find_user_by_login will match the definition that returns nil, causing the find_user_by_device function to be called. However, if login_id is present and a matching record is found, find_user_by_device will never be called.

This initial implementation was easy to reason about, but each of the find_user_by_* functions makes a separate call to the database. We think most of our users are not logged in, so most calls to this endpoint will result in at least two database calls. I wanted to rewrite this as a single query, preserving the hierarchy by ordering the results with login_id at the top. In SQL, the query would have the following format and is even simpler if device_id is not present in the parameters as there is then no need to join the devices table:

However, this query is not simple to recreate with Ecto 2.0.1, the version installed with the current stable release of Phoenix, version 1.2. Ecto’s where macro can accept a list of parameters as we have here, but it will always combine these with AND rather than OR. To correctly build the where clause, we have to specify the OR manually. Some of our lookups may include login_id, device_id and prism_id, but others may include just login_id and prism_id or some other combination. Because we are explicitly defining the where clause to combine parameters with OR, we have to account for each combination of identifiers.

I knew those seven where_clause functions would be a pain to maintain. When we added facebook_id as another parameter, I needed to find a better solution. Persisting with this approach would cause those seven functions to balloon to fifteen just to handle one more identifier. One optimization would be to handle all combinations of the same number of parameters in one function. For example, whether the parameters contain login_id and prism_id, or facebook_id and prism_id, they should be handled by one function that takes a list with two elements. The function would need to determine the field names at runtime based on the parameter keys. I chose to use Ecto fragments to build these clauses.

Fragments are a powerful feature of Ecto that allow us to inject SQL directly into our Ecto queries. Once I discovered these, I expected I would be able to compose my query string from an arbitrary number of parameters, removing the need for multiple where_clause functions entirely. However, fragments will not accept a string passed from a variable — the strings must be defined at compile time. This helps reduce the risk of introducing SQL injection vulnerabilities to the code. I was surprised by this, but I think it shows how thorough the Ecto team have been in designing for both flexibility and security.

So although fragments didn’t eliminate the need for multiple functions to handle lists of different length, by determining the fields and values at runtime I was able to design a set of functions that only expands linearly as we add more parameters. For each list size, two functions are needed as we query different tables if device_id is present.

This solution is more succinct and, with fewer functions, is probably easier to maintain, but it is not easy to tell what is happening at a glance. As an alternative, I considered building a raw SQL string and executing this with Ecto.Adapters.SQL.query, but this increases the likelihood of introducing vulnerabilities and adds complexity to convert the result to the appropriate struct. I also saw some suggestions online that the problem can also be solved by writing a custom macro to generate the appropriate fragments.

In my search for solutions, I was very excited to hear that Ecto 2.1 adds support for these types of queries with the or_where macro. Like the where macro, this can take a list of any length to build the where clause, but this time it will combine them with OR — exactly what I was originally hoping to do.

Ecto 2.1 is still at the release candidate stage, but hex.pm now lists this as the most current version. This is not true on hexdocs.pm, though, so be sure to include the version number in the URL when exploring the API: https://hexdocs.pm/ecto/2.1.0-rc.3/Ecto.html. In order to upgrade my project, I modified my mix.exs file to override the existing version of Ecto that comes with Phoenix 1.2.

Thanks to this amazing new feature, the final version of my query is also the simplest and will not need to change as we add more identifier fields to the users table.

In the case where device_id is not present, you can see that it is possible to use or_where without an initial call to where. In the case where device_id is present, I had to move the device_id lookup to a separate where call, as the or_where macro attempts to find that field on the users table. This is one restriction that means that or_where is not a magic bullet for all complex queries of this nature, but it is nevertheless a welcome addition.

We have been running this version of the code, using Ecto 2.1.0-rc.3, in production for several weeks without issue. This change reduced our 50th percentile response time by several milliseconds, although whether this is directly related to the query rewrite or efficiency improvements in Ecto is not clear.

Red bar shows deploy of Ecto upgrade and or_where changes. Time units are microseconds.

Throughout this process I was very pleased to have a comprehensive test suite to ensure I wasn’t introducing any regressions with these multiple rounds of changes. I was also glad that we have kept Portmeirion’s dependencies up-to-date such that adding the latest version of Ecto required very few changes beyond the deprecation of cast/4 in favor of cast/3 and validate_required in the model changesets. Given how fast the Elixir ecosystem is growing, it can be a challenge to keep up with changes in your dependencies, but sometimes those changes are exactly what you need.

--

--